Update type-parser for v2 #1514

jpivarski · 2022-06-22T22:44:58Z

A more proper fix than #1513. (See continuing issues in scikit-hep/uproot5/pull/620.)

codecov · 2022-06-23T00:08:26Z

Codecov Report

Merging #1514 (642e6b9) into main (0c6ae03) will decrease coverage by 2.66%.
The diff coverage is 84.89%.

Impacted Files	Coverage Δ
...awkward/_v2/_connect/rdataframe/from_rdataframe.py	`0.00% <0.00%> (ø)`
...c/awkward/_v2/_connect/rdataframe/to_rdataframe.py	`0.00% <0.00%> (ø)`
src/awkward/_v2/types/_awkward_datashape_parser.py	`47.66% <ø> (ø)`
src/awkward/_v2/_connect/cling.py	`25.99% <9.09%> (-0.35%)`	⬇️
src/awkward/_v2/types/recordtype.py	`87.06% <91.66%> (+0.83%)`	⬆️
src/awkward/_v2/types/type.py	`96.46% <96.10%> (-1.27%)`	⬇️
src/awkward/_v2/_lookup.py	`97.51% <100.00%> (+0.01%)`	⬆️
src/awkward/_v2/types/__init__.py	`100.00% <100.00%> (ø)`
src/awkward/_v2/types/regulartype.py	`90.47% <0.00%> (+4.76%)`	⬆️

jpivarski · 2022-06-23T15:18:23Z

The two tests that are failing are failing in a dtype merger test:

  _______________________________ test_numpyarray ________________________________
  
      def test_numpyarray():
          for dtype1 in ("i1", "i2", "i4", "i8", "u1", "u2", "u4", "u8", "f4", "f8", "?"):
              for dtype2 in ("i1", "i2", "i4", "i8", "u1", "u2", "u4", "u8", "f4", "f8", "?"):
                  for dtype3 in (
                      "i1",
                      "i2",
                      "i4",
                      "i8",
                      "u1",
                      "u2",
                      "u4",
                      "u8",
                      "f4",
                      "f8",
                      "?",
                  ):
                      for dtype4 in (
                          "i1",
                          "i2",
                          "i4",
                          "i8",
                          "u1",
                          "u2",
                          "u4",
                          "u8",
                          "f4",
                          "f8",
                          "?",
                      ):
                          one = np.array([0, 1, 2], dtype=dtype1)
                          two = np.array([3, 0], dtype=dtype2)
                          three = np.array([], dtype=dtype3)
                          four = np.array([4, 5, 0, 6, 7], dtype=dtype4)
                          combined = np.concatenate([one, two, three, four])
  
                          ak_combined = ak.layout.NumpyArray(one).mergemany(
                              [
                                  ak.layout.NumpyArray(two),
                                  ak.layout.NumpyArray(three),
                                  ak.layout.NumpyArray(four),
                              ]
                          )
  
                          assert ak.to_list(ak_combined) == combined.tolist()
  >                       assert ak.to_numpy(ak_combined).dtype == combined.dtype
  E                       assert dtype('float64') == dtype('float32')
  E                        +  where dtype('float64') = array([0., 1., 2., 3., 0., 4., 5., 0., 6., 7.]).dtype
  E                        +    where array([0., 1., 2., 3., 0., 4., 5., 0., 6., 7.]) = <function to_numpy at 0x7f997209cee0>(<NumpyArray format="d" shape="10" data="0 1 2 3 0 4 5 0 6 7" at="0x7f997322b980"/>)
  E                        +      where <function to_numpy at 0x7f997209cee0> = ak.to_numpy
  E                        +  and   dtype('float32') = array([0., 1., 2., 3., 0., 4., 5., 0., 6., 7.], dtype=float32).dtype
  
  ak_combined = <NumpyArray format="d" shape="10" data="0 1 2 3 0 4 5 0 6 7" at="0x7f997322b980"/>
  combined   = array([0., 1., 2., 3., 0., 4., 5., 0., 6., 7.], dtype=float32)
  dtype1     = 'i1'
  dtype2     = 'i1'
  dtype3     = 'u2'
  dtype4     = 'f4'
  four       = array([4., 5., 0., 6., 7.], dtype=float32)
  one        = array([0, 1, 2], dtype=int8)
  three      = array([], dtype=uint16)
  two        = array([3, 0], dtype=int8)
  
  work/awkward/awkward/tests/test_0449-merge-many-arrays-in-one-pass.py:52: AssertionError

They use numpy-1.23.0-cp38-cp38-macosx_10_9_x86_64.whl and numpy-1.23.0-cp39-cp39-win_amd64.whl, respectively, but other tests run NumPy 1.23.

In particular, the latest Azure MacOS runs numpy-1.23.0-cp310-cp310-macosx_10_9_x86_64.whl and the latest Azure Windows runs numpy-1.23.0-cp310-cp310-win_amd64.whl.

Hmmm. The test failure is in v1, which has to manually duplicate what NumPy does. v2 probably uses NumPy's own dtype promotion algorithm directly (i.e. concatenate two empty arrays of the appropriate dtype and see what the result is). Actually, it's not even that: it directly asks NumPy to concatenate them:

awkward/src/awkward/_v2/contents/numpyarray.py

Lines 503 to 505 in 2b6252d

    
           contiguous_arrays = self._nplike.concatenate(contiguous_arrays) 
        
           next = NumpyArray(contiguous_arrays, self._identifier, parameters, self._nplike)

So even if NumPy changes their promotion rules, v2 should be insensitive to that change (it should change along with them). But those actual tests are commented out!

awkward/tests/v2/test_0449-merge-many-arrays-in-one-pass.py

Lines 53 to 93 in 2b6252d

    
           assert to_list(ak_combined) == combined.tolist() 
        
           assert ak_combined.dtype == combined.dtype 
        
           # assert ak._v2.contents.NumpyArray(one).typetracer.mergemany( 
        
           #     [ 
        
           #         ak._v2.contents.NumpyArray(two), 
        
           #         ak._v2.contents.NumpyArray(three), 
        
           #         ak._v2.contents.NumpyArray(four), 
        
           #     ] 
        
           # ).form == ak._v2.contents.NumpyArray(one).mergemany( 
        
           #     [ 
        
           #         ak._v2.contents.NumpyArray(two), 
        
           #         ak._v2.contents.NumpyArray(three), 
        
           #         ak._v2.contents.NumpyArray(four), 
        
           #     ] 
        
           # ).form 
        
           ak_combined = ak._v2.contents.NumpyArray(one).mergemany( 
        
               [ 
        
                   ak._v2.contents.NumpyArray(two), 
        
                   ak._v2.contents.EmptyArray(), 
        
                   ak._v2.contents.NumpyArray(four), 
        
               ] 
        
           ) 
        
           assert to_list(ak_combined) == combined.tolist() 
        
           assert ak_combined.dtype == np.concatenate([one, two, four]).dtype 
        
           # assert ak._v2.contents.NumpyArray(one).typetracer.mergemany( 
        
           #     [ 
        
           #         ak._v2.contents.NumpyArray(two), 
        
           #         ak._v2.contents.EmptyArray(), 
        
           #         ak._v2.contents.NumpyArray(four), 
        
           #     ] 
        
           # ).form == ak._v2.contents.NumpyArray(one).mergemany( 
        
           #     [ 
        
           #         ak._v2.contents.NumpyArray(two), 
        
           #         ak._v2.contents.EmptyArray(), 
        
           #         ak._v2.contents.NumpyArray(four), 
        
           #     ] 
        
           # ).form

I should do another PR to uncomment those and make sure they work. I don't care too much about whether v1 only agrees with a range of NumPy versions, it's about to become irrelevant soon. So that PR might drop the v1 test, while enabling the v2 test.

jpivarski added 2 commits June 22, 2022 17:43

New grammar for v2 type strings; start with JSON again.

2bcd44b

Type.parameter should not return None.

99c8378

jpivarski added 7 commits June 22, 2022 19:19

Developing type parser: through OptionType.

c16c9e2

Fix empty record type string.

f0d9970

Named record parsing is working.

1e3c2f4

Restrictions on record name in the type string.

ffc2102

Parsing records works.

a3c0081

Prevent record type string from using reserved words in name.

b30ce29

Tests are done; this is a working v2 type parser.

65bf4ad

jpivarski marked this pull request as ready for review June 23, 2022 03:31

jpivarski added 3 commits June 23, 2022 14:58

Merge branch 'main' into jpivarski/update-type-parser-for-v2

8ac0c8e

Add a standalone v2 types parser to the codebase.

a3504a1

New v2 type-parser is done.

48d09ab

jpivarski mentioned this pull request Jun 23, 2022

Swap 'merged dtype same as NumPy' test of v1 for test of v2. #1515

Merged

Ignore pylint errors for the auto-generated code, too.

642e6b9

jpivarski merged commit 4ba1dbb into main Jun 23, 2022

jpivarski deleted the jpivarski/update-type-parser-for-v2 branch June 23, 2022 22:34

jpivarski mentioned this pull request Jan 9, 2024

chore: restore files that generated the type-parser #2926

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update type-parser for v2 #1514

Update type-parser for v2 #1514

jpivarski commented Jun 22, 2022

codecov bot commented Jun 23, 2022 •

edited

Loading

jpivarski commented Jun 23, 2022

Update type-parser for v2 #1514

Update type-parser for v2 #1514

Conversation

jpivarski commented Jun 22, 2022

codecov bot commented Jun 23, 2022 • edited Loading

Codecov Report

jpivarski commented Jun 23, 2022

codecov bot commented Jun 23, 2022 •

edited

Loading